22 research outputs found

    Big Data Optimization: Algorithmic Framework for Data Analysis Guided by Semantics

    Thesis defense date: 9 November 2018. Over the past decade, the rapid growth of data generation in all domains of knowledge (traffic, medicine, social networks, industry, etc.) has highlighted the need to enhance the process of analyzing large data volumes, in order to manage them more easily and to discover new relationships hidden within them. Optimization problems, which are commonly found in today's industry, are not unrelated to this trend, so Multi-Objective Optimization Algorithms (MOAs) must take this new scenario into account. This means that MOAs have to deal with problems that have several data sources (typically streaming) and/or huge amounts of data. These features are found in particular in Dynamic Multi-Objective Problems (DMOPs), which are related to Big Data optimization problems, mostly with regard to velocity and variability. When dealing with DMOPs, whenever a change in the environment affects the solutions of the problem (i.e., the Pareto set, the Pareto front, or both), and therefore the fitness landscape, the optimization algorithm must react and adapt the search to the new features of the problem. Big Data analytics are long and complex processes, so, with the aim of simplifying them, they are carried out as a series of steps: a typical analysis comprises data collection, data manipulation, data analysis and, finally, result visualization. When creating a Big Data workflow, the analyst should bear in mind the semantics involving the problem domain knowledge and its data. Ontologies are the standard way of describing the knowledge about a domain. As the global goal of this PhD Thesis, we investigate the use of semantics in the process of Big Data analysis, focusing not only on machine learning analysis, but also on optimization.
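
    To make the reaction mechanism concrete, the following minimal Python sketch (not taken from the thesis; all names are illustrative) shows the basic DMOP loop: a streaming environment parameter shifts two conflicting objectives, and the algorithm injects diversity whenever a change in the fitness landscape is detected.

        import random

        def evaluate(x, env):
            # Two conflicting objectives whose optima shift with the environment.
            return (x - env) ** 2, (x + env) ** 2

        def environment(t):
            # Toy time-varying parameter standing in for a streaming data source.
            return 1.0 if (t // 50) % 2 == 0 else -1.0

        pop = [random.uniform(-2, 2) for _ in range(20)]
        env = environment(0)
        for t in range(200):
            new_env = environment(t)
            if new_env != env:  # change detected in the fitness landscape
                env = new_env
                # React: inject diversity so the search can track the new Pareto set.
                pop = [x + random.gauss(0, 0.5) for x in pop]
            # Rank by a simple scalarization (a real MOA would use Pareto dominance).
            pop.sort(key=lambda x: sum(evaluate(x, env)))
            pop = pop[:10] + [p + random.gauss(0, 0.1) for p in pop[:10]]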

    Scalable Inference of Gene Regulatory Networks with the Spark Distributed Computing Platform

    Inference of Gene Regulatory Networks (GRNs) remains an important open challenge in computational biology. The goal of bio-model inference is to obtain, from time-series of gene expression data, the sparse topological structure and the parameters that quantitatively understand and reproduce the dynamics of the biological system. Nevertheless, the inference of a GRN is a complex optimization problem that involves processing S-System models, which include large amounts of gene expression data from hundreds (even thousands) of genes in multiple time-series (assays). This complexity, along with the amount of data managed, makes the inference of GRNs a computationally expensive task. Therefore, the generation of parallel algorithmic proposals that operate efficiently on distributed processing platforms is a must in current reconstruction of GRNs. In this paper, a parallel multi-objective approach is proposed for the optimal inference of GRNs, minimizing both the Mean Squared Error of the S-System model and a Topology Regularization value. A flexible and robust multi-objective cellular evolutionary algorithm is adapted to deploy parallel tasks in the form of Spark jobs. The proposed approach has been developed using the jMetal framework and, in order to perform parallel computation, uses Spark on a cluster of distributed nodes to evaluate candidate solutions modeling the interactions of genes in biological networks.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
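
    A minimal PySpark sketch of the parallel evaluation scheme, assuming a toy linear surrogate in place of the paper's S-System simulator: each candidate (parameter matrix plus topology mask) is scored in a Spark task on the two objectives, fitting error and topology size.

        import numpy as np
        from pyspark import SparkContext

        def fitness(candidate, series):
            # Toy linear surrogate standing in for the paper's S-System model.
            W, mask = candidate
            pred = series[:-1] @ (W * mask)                 # one-step-ahead prediction
            mse = float(np.mean((pred - series[1:]) ** 2))  # objective 1: fitting error
            sparsity = float(mask.sum())                    # objective 2: topology size
            return mse, sparsity

        sc = SparkContext(appName="grn-inference-sketch")
        rng = np.random.default_rng(0)
        series = rng.random((50, 10))          # toy data: 50 time points x 10 genes
        series_bc = sc.broadcast(series)
        population = [(rng.normal(size=(10, 10)), rng.integers(0, 2, (10, 10)))
                      for _ in range(100)]
        # Each candidate solution is evaluated in a parallel Spark task.
        scores = (sc.parallelize(population)
                    .map(lambda c: fitness(c, series_bc.value))
                    .collect())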

    KNIT: Ontology reusability through knowledge graph exploration

    Ontologies have become a standard for knowledge representation across several domains. In Life Sciences, numerous ontologies have been introduced to represent human knowledge, often providing overlapping or conflicting perspectives. These ontologies are usually published as OWL or OBO, and are often registered in open repositories, e.g., BioPortal. However, the task of finding the concepts (classes and their properties) defined in the existing ontologies and the relationships between these concepts across different ontologies – for example, for developing a new ontology aligned with the existing ones – requires a great deal of manual effort in searching through the public repositories for candidate ontologies and their entities. In this work, we develop a new tool, KNIT, to automatically explore open repositories and help users fetch previously designed concepts using keywords. User-specified keywords are used to retrieve matching names of classes or properties. KNIT then creates a draft knowledge graph populated with the concepts and relationships retrieved from the existing ontologies. Furthermore, following the process of ontology learning, our tool refines this first draft of an ontology. We present three BioPortal-specific use cases for our tool. These use cases outline the development of new knowledge graphs and ontologies in the sub-domains of biology: genes and diseases, virome and drugs.
    This work has been funded by grant PID2020-112540RB-C4121, AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploitation). Funding for open access charge: Universidad de Málaga / CBUA
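
    The kind of keyword lookup that KNIT automates can be sketched against BioPortal's public REST search endpoint; the fields read below (prefLabel, @id, links.ontology) follow BioPortal's documented JSON responses, and the API key is a placeholder.

        import requests

        API_KEY = "your-bioportal-api-key"  # issued by https://bioportal.bioontology.org

        def search_concepts(keyword):
            # Query BioPortal's REST search endpoint for classes matching a keyword.
            resp = requests.get("https://data.bioontology.org/search",
                                params={"q": keyword, "apikey": API_KEY}, timeout=30)
            resp.raise_for_status()
            return [{"label": item.get("prefLabel"),
                     "iri": item.get("@id"),
                     "ontology": item.get("links", {}).get("ontology")}
                    for item in resp.json().get("collection", [])]

        for hit in search_concepts("gene")[:5]:
            print(hit["label"], "->", hit["iri"])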

    Injecting domain knowledge in multi-objective optimization problems: A semantic approach

    In the field of complex problem optimization with metaheuristics, semantics has been used for modeling different aspects, such as problem characterization, parameters, decision-maker's preferences, or algorithms. However, there is a lack of approaches where ontologies are applied in a direct way in the optimization process, with the aim of enhancing it by allowing the systematic incorporation of additional domain knowledge. This is due to the high level of abstraction of ontologies, which makes them difficult to map onto the code implementing the problems and/or the specific operators of metaheuristics. In this paper, we present a strategy to inject domain knowledge (by reusing existing ontologies or creating a new one) into a problem implementation that will be optimized using a metaheuristic. This approach, based on accepted ontologies, enables building and exploiting complex computing systems in optimization problems. We describe a methodology to automatically induce user choices (taken from the ontology) into the problem implementations provided by the jMetal optimization framework. To illustrate our proposal, we focus on the urban domain. Concretely, we start by defining an ontology representing the domain semantics of a city (e.g., buildings, bridges, points of interest, routes, etc.) that allows a decision maker to define a-priori preferences in a standard, reusable, and formal (logic-based) way. We validate our proposal with several instances of two use cases, consisting of bi-objective formulations of the Traveling Salesman Problem (TSP) and the Radio Network Design problem (RND), both in the context of an urban scenario. The results of the experiments conducted show how the semantic specification of domain constraints is effectively mapped into feasible solutions of the tackled TSP and RND scenarios.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
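
    As a sketch of the idea (not the paper's actual jMetal code, which is Java), the Python snippet below loads a hypothetical city ontology with owlready2 and turns an ontology-encoded preference into the second objective of a bi-objective TSP; the ontology file, the PointOfInterest class and its "preferred" annotation are placeholders.

        from owlready2 import get_ontology

        # Hypothetical city ontology; class and annotation names are placeholders.
        onto = get_ontology("file://city.owl").load()
        preferred = {poi.name for poi in onto.PointOfInterest.instances()
                     if getattr(poi, "preferred", False)}

        def evaluate_tour(tour, distance):
            # Objective 1: classic TSP tour length.
            length = sum(distance[tour[i]][tour[(i + 1) % len(tour)]]
                         for i in range(len(tour)))
            # Objective 2: a-priori decision-maker preference taken from the
            # ontology, as a penalty for each preferred point not visited.
            missed = len(preferred - set(tour))
            return length, missed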

    Multi-Objective Evolutionary Algorithm for Interactive Decision Making in Dynamic Optimization

    Due to the growing interest in the analysis of streaming data in Big Data environments for decision making, dynamic optimization problems involving two or more conflicting objectives are increasingly common. However, approaches combining dynamic multi-objective optimization with preference articulation for decision making are still scarce. In this article, we propose a new dynamic multi-objective optimization algorithm called InDM2, which incorporates the preferences of the (human) expert decision maker to guide the search process. With InDM2, the decision maker can not only express preferences by means of one or more reference points (which define the desired region of interest), but these points can also be modified interactively. The proposal incorporates methods to graphically display the different approximations of the region of interest obtained during the optimization process. The decision maker can thus inspect and change, at optimization time, the region of interest according to the displayed information. The main features of InDM2 are described and its behaviour is analysed through academic use cases.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
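
    A minimal sketch of the reference-point mechanism (InDM2 itself is built on jMetal and is considerably more elaborate): the decision maker's reference point defines the region of interest, solutions are ranked by distance to it, and the point can be replaced while the run is in progress.

        import math

        reference_point = [0.3, 0.7]  # set interactively by the decision maker

        def update_reference_point(new_point):
            # Invoked when the decision maker moves the region of interest
            # during the optimization run.
            global reference_point
            reference_point = list(new_point)

        def distance_to_reference(objectives):
            return math.dist(objectives, reference_point)

        # Rank objective vectors by closeness to the region of interest.
        front = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]]
        front.sort(key=distance_to_reference)
        update_reference_point([0.8, 0.3])  # interactive change mid-run
        front.sort(key=distance_to_reference)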

    A Framework for Big Data Optimization Based on jMetal and Spark

    Multi-objective metaheuristics have become widely used techniques for solving complex optimization problems composed of several mutually conflicting objective functions. We are currently immersed in the Big Data era, so the multi-objective problems arising in this context will exhibit some of the five V's that characterize Big Data applications (volume, velocity, variety, veracity, value). As a consequence, metaheuristics must be able to solve dynamic problems, which may change over time due to the processing and analysis of different data sources, typically in streaming. In this work we present the software jMetalSP, which combines the jMetal framework with Apache Spark. In this way, the metaheuristics available in jMetal can be easily adapted to solve dynamic problems fed by different streaming data sources managed by Spark. We describe the architecture of jMetalSP and validate it with a realistic use case based on a bi-objective TSP with real open traffic data from New York City.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
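
    The core jMetalSP idea can be sketched in a few lines of PySpark (jMetalSP itself is Java): a Spark-managed streaming source updates the data of the dynamic problem while the metaheuristic runs; the socket source and the "i,j,cost" message format are illustrative.

        from pyspark import SparkContext
        from pyspark.streaming import StreamingContext

        distances = {}  # shared problem data read by the running algorithm

        def apply_updates(rdd):
            # Each streamed record mutates the dynamic bi-objective TSP data.
            for line in rdd.collect():
                i, j, cost = line.split(",")
                distances[(int(i), int(j))] = float(cost)

        sc = SparkContext(appName="jmetalsp-sketch")
        ssc = StreamingContext(sc, batchDuration=5)
        ssc.socketTextStream("localhost", 9999).foreachRDD(apply_updates)
        ssc.start()  # the optimizer polls `distances` from its own thread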

    Artificial intelligence for automatically detecting animals in camera trap images: a combination of MegaDetector and YOLOv5

    Camera traps have gained high popularity for collecting animal images in a cost-effective and non-invasive manner, but manually examining these large volumes of images to extract valuable data is a laborious and costly process. Deep learning, specifically object detection techniques, constitutes a powerful tool for automating this task. Here, we describe the development and results of a deep-learning workflow based on MegaDetector and YOLOv5 for automatically detecting animals in camera trap images. For the development, we first used MegaDetector, which automatically generated bounding boxes for 93.2% of the images in the training set, differentiating animals, humans, vehicles, and empty photos. This annotation phase allowed us to discard useless images. Then, we used the images containing animals within the training dataset to train four YOLOv5 models, each one built for a group of species of similar appearance as defined by a human expert. Using four expert models instead of one reduces the complexity and variance between species, allowing for more precise learning within each of the groups. The final result is a workflow where the end-user enters the camera trap images into a global model. This global model then redirects the images towards the appropriate expert model. Finally, the classification of an animal into a particular species is based on the confidence rates provided by a weighted voting system implemented among the expert models. We validated this workflow using a dataset of 120,000 images collected by 100 camera traps over five years in Andalusian National Parks (Spain), with a representation of 24 mammal species. Our workflow approach improved the global classification F1-score from 0.92 to 0.96. It also increased the precision for distinguishing similar species, for example from 0.41 to 0.96 for C. capreolus, and from 0.24 to 0.73 for D. dama, which is often confused with other ungulate species; this demonstrates its potential for animal detection in images.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
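
    A condensed sketch of the routing-and-voting stage, assuming YOLOv5 weights loadable via torch.hub; the weight files and group names are placeholders, and the global model is assumed to emit the expert-group name as its detection class.

        import torch

        global_model = torch.hub.load("ultralytics/yolov5", "custom", path="global.pt")
        experts = {g: torch.hub.load("ultralytics/yolov5", "custom", path=f"{g}.pt")
                   for g in ("ungulates", "carnivores", "lagomorphs", "birds")}

        def classify(image_path):
            votes = {}
            for _, det in global_model(image_path).pandas().xyxy[0].iterrows():
                expert = experts[det["name"]]  # route image to the matching expert
                for _, pred in expert(image_path).pandas().xyxy[0].iterrows():
                    # Weighted voting: accumulate expert confidence per species.
                    votes[pred["name"]] = votes.get(pred["name"], 0) + pred["confidence"]
            return max(votes, key=votes.get) if votes else "empty"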

    TITAN: A knowledge-based platform for Big Data workflow management

    Modern applications of Big Data are transcending from being scalable solutions for data processing and analysis to providing advanced functionalities with the ability to exploit and understand the underpinning knowledge. This change is promoting the development of tools at the intersection of data processing, data analysis, knowledge extraction and management. In this paper, we propose TITAN, a software platform for managing the whole life cycle of scientific workflows, from deployment to execution, in the context of Big Data applications. This platform is characterised by a design and operation mode driven by semantics at different levels: data sources, problem domain and workflow components. The proposed platform is developed upon an ontological framework of meta-data that consistently manages processes and models and takes advantage of domain knowledge. TITAN comprises a well-grounded stack of Big Data technologies, including Apache Kafka for inter-component communication, Apache Avro for data serialisation and Apache Spark for data analytics. A series of use cases are conducted for validation, comprising workflow composition and semantic meta-data management in academic and real-world fields of human activity recognition and land use monitoring from satellite images.
    Universidad de Málaga. Andalucía TECH
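
    A sketch of what a component boundary in this stack could look like, assuming the kafka-python and fastavro packages; the topic name, broker address and record schema are placeholders, not TITAN's actual configuration.

        import io
        from fastavro import schemaless_reader
        from kafka import KafkaConsumer

        # Placeholder Avro schema for records exchanged between components.
        SCHEMA = {"type": "record", "name": "Reading",
                  "fields": [{"name": "sensor", "type": "string"},
                             {"name": "value", "type": "double"}]}

        consumer = KafkaConsumer("titan-input", bootstrap_servers="localhost:9092")
        for message in consumer:
            # Deserialise the Avro payload and hand it to the analysis step
            # (e.g. a Spark job) that implements the workflow component.
            record = schemaless_reader(io.BytesIO(message.value), SCHEMA)
            print(record["sensor"], record["value"])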

    Using TITAN in Life Sciences

    TITAN is a software platform for managing workflows, from deployment to execution, in the context of Big Data applications. This platform is characterised by a design and operation mode driven by semantics at different levels: data sources, problem domain and workflow components. The platform uses ontologies as the core element for meta-data management, and Big Data technologies in its architecture: Apache Kafka is used for inter-component communication, Apache Avro for data serialisation and Apache Spark for data analytics. TITAN is being used in the EnBiC2-Lab (Environmental and Biodiversity Climate Change Lab) project, part of the LifeWatch ERIC ecosystem. This project addresses the challenge of creating a set of databases, tools and a Virtual Research Environment (VRE) to monitor and analyse the effects of Climate Change in a comprehensive way, through the integration of measures and results from five different perspectives: water, air, soil, fauna and flora. Thus, TITAN will be made available to the LifeWatch community as the Big Data VRE.

    A Fine Grain Sentiment Analysis with Semantics in Tweets

    Social networks are nowadays a major source of new information in the world. Microblogging sites like Twitter have millions of active users (320 million active users on Twitter as of 30 September 2015) who share their opinions in real time, generating huge amounts of data. These data are, in most cases, available to any network user. The opinions of Twitter users have become something that companies and other organisations study to see whether or not users like the products or services they offer. One way to assess opinions on Twitter is to classify the sentiment of the tweets as positive or negative. However, this is usually done at a coarse-grained level, assigning a single label to the whole tweet, even though a tweet can be partially positive and negative at the same time when it refers to different entities. As a result, general approaches usually classify such tweets as "neutral". In this paper, we propose a semantic analysis of tweets, using Natural Language Processing to classify the sentiment with regard to the entities mentioned in each tweet. We offer a combination of Big Data tools (under the Apache Hadoop framework) and sentiment analysis using RDF graphs supporting the study of the tweet's lexicon. This work has been empirically validated using a sporting event, the 2014 Phillips 66 Big 12 Men's Basketball Championship. The experimental results show a clear correlation between the predicted sentiments and specific events during the championship.
    Ministerio de Ciencia e Innovación TIN2014-58304-R. Junta de Andalucía P11-TIC-7529. Junta de Andalucía P12-TIC-151.
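
    A small rdflib sketch of the entity-level representation (the namespace and property names are illustrative, not the paper's vocabulary): each tweet gets one sentiment node per mentioned entity, so opposing polarities can coexist instead of collapsing into "neutral".

        from rdflib import BNode, Graph, Literal, Namespace, URIRef

        EX = Namespace("http://example.org/sentiment#")
        g = Graph()

        def add_entity_sentiment(tweet_id, entity, polarity):
            tweet = URIRef(f"http://example.org/tweet/{tweet_id}")
            mention = BNode()  # one node per (tweet, entity) pair
            g.add((tweet, EX.hasMention, mention))
            g.add((mention, EX.entity, EX[entity]))
            g.add((mention, EX.polarity, Literal(polarity)))

        # The same tweet is positive towards one team and negative towards another.
        add_entity_sentiment("42", "TeamA", "positive")
        add_entity_sentiment("42", "TeamB", "negative")
        print(g.serialize(format="turtle"))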